Using R to visualise impact of Covid-19 on stock price of top 40 companies in Singapore
In this take home exercise the task will be to visualize the impact of Covid-19 on stock prices of top 40 companies in terms of market capitalization. For this task stock prices between 01-January-2020 to 31-December 2021 will be used.
The challenge of this exercise would be to represent the 40 stocks in a concise manner such that the 40 companies could be seen in one view, yet not hide patterns over the required time period.
The proposed solution is a horizon plot to show the top 40 stocks in terms of market capitalization. With the aim to visualize the effects of Covid-19 on stock prices, the colour scheme chosen such that falls in stock prices could be picked out by the reader.
For this task before visualization, the data would have to be scraped from a financial news portal. Thus, the following r packages are loaded:
packages = c('tidyquant','rmarkdown','tidyverse','rmarkdown','ggHoriPlot', 'ggthemes')
for (p in packages) {
if(!require(p,character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
To scrape the data from a financial new portal, the ticker symbol of each stock would be used to retrieve the specified stock data. The list of top stocks in terms of market capitalization is first downloaded from here. The file is stored as a csv file. In the code chunk below, read_csv function from tidyverse is used to read the data into tibble object name ‘stocks’. The spreadsheet is downloaded on 23-Feb-2022 and order of the stocks in terms of market capitalization may have changed since then.
stocks <- read_csv("data/companiesmarketcap.com - Largest companies in Singapore by market capitalization.csv")
The market spreadsheet contains information such as market cap, country of origin and price at the time of download. For scraping stock price information only the symbol of the stock is required. In the step below, the symbol column is selected out of the table. By default top 41 companies is downloaded in the list. The slice_head function is used to filter to only top 40. The output of the select function is a one column series object. The pull function is used to convert this into a vector such that it could be later using in a for loop to compile the stock data.
stock_list <- stocks %>% select(Symbol) %>% slice_head(n = 40) %>% pull(Symbol)
The code chunk below is used to compile the stock data that is scraped from Yahoo finance website with functions from tidyquant package. A for loop is used to iteratively scrape each stock’s data using the ticker symbol.
To do so an empty dataframe is first initialized and in subsequent steps each retrieved stock information is appended to it to collate all the data. The time period for stock data is defined, using from_date and to_date variables. While the granularity of the data is defined by period_type. In this case ‘days’ is used.
Using a for loop, the symbol for each of the 40 stocks is iterated and used to retrieve stock data from Yahoo finance using tq_get function. The retrieve data is store temporarily in ‘stock_data_daily’ data table. To separate the appended data, the stock symbol is added to stock_data_daily before using a bind_rows function to append the data.
stocks_data = data.frame()
from_date <- "2020-01-01"
to_date <- "2021-12-31"
period_type <- "days"
for (i in stock_list) {
stock_selected <- i
stock_data_daily <- tq_get(stock_selected,
get = "stock.prices",
from = from_date,
to = to_date) %>% tq_transmute(select = NULL,
mutate_fun = to.period,
period = period_type)
stock_data_daily <- stock_data_daily %>% add_column(Symbol = i)
stocks_data <- bind_rows(stocks_data, stock_data_daily)
}
To make the chart more readable, the retrieved price data is merged with the top 40 stock list to have full stock names in the code chuck below using inner_join function.
stocks_data_complete <- stocks_data %>% inner_join(stocks, c("Symbol" = "Symbol"))
Unused columns that are not required for the visualization is removed in the step below.
stocks_data_complete <- stocks_data_complete %>% select(!c(Rank, marketcap, `price (USD)`, country))
To preserve the order of the stocks in terms of market cap, the names of stock is converted from a character type to factor type column below.
stocks_data_complete$Name <-factor(stocks_data_complete$Name, levels = stocks$Name)
Lastly in the step below the summary function is used to inspect the stocks data. Referencing the length of symbol column is seems that some of the stocks may not have data over a full two year period. Calculating weekdays in a 2 year period for 40 stocks without subtracting public holidays should expect slightly less than 20800 rows of data (52 * 5 * 2 * 40). With a length of 18950 rows in the symbol column could suggest no data for some days of the year for some stocks. This could be inspected in the visualization.
summary(stocks_data_complete)
date open high
Min. :2020-01-02 Min. : 0.080 Min. : 0.080
1st Qu.:2020-07-13 1st Qu.: 1.600 1st Qu.: 1.620
Median :2021-01-14 Median : 2.983 Median : 3.013
Mean :2021-01-09 Mean : 13.975 Mean : 14.206
3rd Qu.:2021-07-13 3rd Qu.: 12.450 3rd Qu.: 12.648
Max. :2021-12-30 Max. :370.000 Max. :372.700
low close volume
Min. : 0.08 Min. : 0.08 Min. : 0
1st Qu.: 1.58 1st Qu.: 1.60 1st Qu.: 36925
Median : 2.96 Median : 2.99 Median : 2177650
Mean : 13.72 Mean : 13.96 Mean : 5498903
3rd Qu.: 12.25 3rd Qu.: 12.42 3rd Qu.: 7487182
Max. :360.52 Max. :366.99 Max. :314380983
adjusted Symbol Name
Min. : 0.0678 Length:18950 ST Engineering: 509
1st Qu.: 1.5432 Class :character Sea (Garena) : 504
Median : 2.8600 Mode :character DBS : 504
Mean : 13.6543 OCBC Bank : 504
3rd Qu.: 12.1064 UOB : 504
Max. :366.9900 Singtel : 504
(Other) :15921
The data is visualized using geom_horizon function in the code chunk below with following specified options with reference from the sample site:
stocks_data_complete %>%
ggplot() +
geom_horizon(aes(date,
open), origin = 'min') +
scale_fill_hcl(palette = 'RdBu', reverse = F) +
facet_grid(Name~.) +
theme_few() +
theme(
panel.spacing.y=unit(0, "lines"),
strip.text.y = element_text(size = 7, angle = 0, hjust = 0),
legend.position = 'none',
axis.text.y = element_blank(),
axis.title.y = element_blank(),
axis.ticks.y = element_blank(),
axis.text.x= element_text(size = 6, angle = 0, hjust = 0.5),
panel.border = element_blank()
) +
scale_x_date(expand=c(0,0), date_breaks = "1 month", date_labels = "%b-%y") +
ggtitle('Open price for top 40 companies in terms of Martket Capitalization \nFrom 01-Jan-2021 to 31-Dec-2021') +
xlab('Month-Year')
The final plot shows the day opening price of top 40 companies over a two year period from Jan-2020 to Dec-21. Rows are arranged in terms of their market capitalization.
Scanning across all the rows two distinct points could be observed to have sharp drop in stock price across numerous of the stocks, late Mar-2020 and late Oct-2020. The worse of these periods was in March 2020 which could coinciding with general global market sentiment attributed to Covid-19 panic as numerous countries enter lock down mode and locally progressively stricter measures were implemented. Affecting almost all stocks in the list. However it seem most companies recovered shortly after in April. No obvious relation to pandemic events occurring in late October were found. Stocks that were mainly affected were banks (DBS, UOB) and property developers (City Developments, UOL group).
Mentioned in earlier section some stocks did not have full two year period data. This companies are Grab Holdings, TDCX, Karooooo and Triterras and are stocks that were recently listed on the exchange in previous two year period.